2 Project Summary

The goal of this research was to develop a hybrid real-time problem-solving architecture that couples
symbolic planning methods with connectionist reinforcement learning methods. The advantage of
this hybrid architecture is that it can achieve reasonable performance immediately, because the
symbolic planning system can quickly develop an acceptable control policy, and it can also improve
gradually toward optimal real-time performance, because the reinforcement learning system will
eventually converge on a near-optimal policy. Many DoD problems would benefit from the ability to
perform near-optimal real-time control of complex systems.

3 Accomplishments

• Developed the ALERT hybrid architecture, which combines a symbolic DRULE planner with
hierarchical reinforcement learning.

• Showed experimentally that the DRULE planner could achieve human-level performance on
the Kanfer-Ackerman air traffic control (ATC) task.

• Developed two learning algorithms for DRULEs: one based on random examples and queries,
and the other based on exercises.

• Showed experimentally that both learning algorithms could achieve intermediate performance
on the ATC task.

• Proved that both learning algorithms are correct and computationally feasible. This involved
proving a new result on the learning of Horn clause logic programs.

• Developed a new hierarchical method for reinforcement learning, the MAXQ method.

• Proved that MAXQ can represent any hierarchical policy.

• Developed the MAXQ-Q learning algorithm for hierarchical reinforcement learning.

• Proved that MAXQ-Q converges asymptotically to a recursively optimal policy.

• Demonstrated experimentally that MAXQ-Q attains optimal performance on a simplified
task that shares many properties with the ATC task.
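The MAXQ results above rest on a value decomposition in which the value of invoking child a inside parent task i in state s splits into the child's own value plus a completion term, Q(i, s, a) = V(a, s) + C(i, s, a). The following is a minimal sketch of how stored values combine under that decomposition, assuming small hand-filled tables. The task names (`Root`, `Navigate`, `Land`), the single state `s0`, the helper functions, and all numbers are hypothetical; they do not come from the ATC experiments, and this is not the project's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical task hierarchy: each composite task lists its children.
# Names without an entry here are treated as primitive actions.
HIERARCHY: Dict[str, List[str]] = {
    "Root": ["Navigate", "Land"],
    "Navigate": ["north", "south"],
    "Land": ["descend"],
}

# V(a, s) for primitive actions a: expected one-step reward (made-up values).
PRIMITIVE_V: Dict[Tuple[str, str], float] = {
    ("north", "s0"): -1.0,
    ("south", "s0"): -1.0,
    ("descend", "s0"): -2.0,
}

# C(i, s, a): completion value, i.e. the expected reward for finishing
# parent task i after child a, invoked in state s, terminates (made-up values).
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("Navigate", "s0", "north"): -3.0,
    ("Navigate", "s0", "south"): -5.0,
    ("Land", "s0", "descend"): 0.0,
    ("Root", "s0", "Navigate"): -2.0,
    ("Root", "s0", "Land"): -10.0,
}


def q_value(task: str, state: str, child: str) -> float:
    """Q(i, s, a) = V(a, s) + C(i, s, a)."""
    return value(child, state) + COMPLETION[(task, state, child)]


def value(task: str, state: str) -> float:
    """V(i, s): table lookup for primitives, max over children otherwise."""
    if task not in HIERARCHY:
        return PRIMITIVE_V[(task, state)]
    return max(q_value(task, state, child) for child in HIERARCHY[task])


def best_child(task: str, state: str) -> str:
    """Greedy choice of child for a composite task in the given state."""
    return max(HIERARCHY[task], key=lambda child: q_value(task, state, child))


if __name__ == "__main__":
    print(value("Root", "s0"))       # -6.0
    print(best_child("Root", "s0"))  # Navigate
```

In a learning algorithm such as MAXQ-Q, the C tables and primitive values would be estimated from experience rather than fixed by hand; the sketch covers only how the stored values are combined recursively through the hierarchy.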

4 Transitions

We are currently working with i2 Technologies (Dallas, Texas) to apply our reinforcement learning
methods to supply chain scheduling and optimization.